Preprocessing and Integration of Data from Multiple Sources for Knowledge Discovery

Authors

  • Marion G. Ceruti
  • Magdi N. Kamel
Abstract

The explosive growth in the generation and collection of data has generated an urgent need for a new generation of techniques and tools that can assist in transforming these data intelligently and automatically into useful knowledge. Knowledge discovery is an emerging multidisciplinary field that attempts to fulfill this need. Knowledge discovery is a large process that includes data selection, cleaning, preprocessing, integration, transformation and reduction, data mining, model selection, evaluation and interpretation, and finally consolidation and use of the extracted knowledge. This paper addresses the issues of data cleaning and integration for knowledge discovery by proposing a systematic approach for resolving semantic conflicts that are encountered during the integration of data from multiple sources. Illustrated with examples derived from military databases, the paper presents a heuristics-based algorithm for identifying and resolving semantic conflicts at different levels of information granularity.
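As a rough illustration of what "identifying semantic conflicts" during integration can mean, the sketch below compares attribute descriptions from two sources and flags three common kinds of conflict: synonyms (different names, same meaning), homonyms (same name, different meaning), and unit mismatches. This is not the paper's algorithm, which the abstract does not spell out. The `Attribute` class, the `meaning` labels, and the sample attributes are all hypothetical and assume each attribute has already been annotated with a shared meaning label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    source: str   # which database the attribute comes from
    name: str     # column/field name as it appears in that source
    meaning: str  # hypothetical shared-vocabulary label for the concept
    unit: str     # unit of measure used by that source


def find_conflicts(a_attrs, b_attrs):
    """Compare every attribute pair across two sources and list conflicts."""
    conflicts = []
    for a in a_attrs:
        for b in b_attrs:
            same_name = a.name.lower() == b.name.lower()
            same_meaning = a.meaning == b.meaning
            if same_name and not same_meaning:
                # Same label, different concept.
                conflicts.append(("homonym", a.name, b.name))
            elif same_meaning and not same_name:
                # Same concept, different label.
                conflicts.append(("synonym", a.name, b.name))
            if same_meaning and a.unit != b.unit:
                # Same concept recorded in different units.
                conflicts.append(("unit", a.name, b.name))
    return conflicts


# Hypothetical sample data, invented for illustration only.
source_a = [
    Attribute("A", "speed", "platform_speed", "knots"),
    Attribute("A", "range", "sensor_range", "nm"),
]
source_b = [
    Attribute("B", "velocity", "platform_speed", "km/h"),
    Attribute("B", "range", "weapon_range", "nm"),
]

conflicts = find_conflicts(source_a, source_b)
for kind, left, right in conflicts:
    print(f"{kind}: A.{left} vs B.{right}")
```

In practice the hard part is obtaining the meaning labels in the first place; a heuristic approach like the one the abstract mentions would presumably infer them from names, data types, and values rather than assume they are given.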

Similar resources

Data Mining for Web Personalization

In this chapter we present an overview of the Web personalization process, viewed as an application of data mining that requires support for all the phases of a typical data mining cycle. These phases include data collection and preprocessing, pattern discovery and evaluation, and finally applying the discovered knowledge in real time to mediate between the user and the Web. This view of the personaliza...

Adaptive Information Analysis in Higher Education Institutes

Information integration plays an important role in academic environments since it provides a comprehensive view of education data and enables managers to analyze and evaluate the effectiveness of education processes. However, the problem with traditional information integration is the lack of personalization due to weak information resources or the unavailability of analysis functionality. In this ...

Unstructured information integration through data-driven similarity discovery

Information integration from multiple heterogeneous sources is one of the major challenges facing enterprises and service providers today, and one of the important problems in this domain is the integration of structured and unstructured (or text) data. In this paper we describe our work on a data-driven approach to integrating various sources of text data, without relying on the availability o...

A Tool for Support of the KDD Process

This paper presents basic ideas and results of the GOAL project focusing on its knowledge discovery part. Within this project a KDD (Knowledge Discovery in Databases) package [7] has been designed and implemented. In this paper motivation, architecture, functionality and one two of the implemented DM (data mining) modules of the KDD Package are described in greater detail. KDD Package supports ...

Journal:
  • International Journal on Artificial Intelligence Tools

Volume 8  Issue 

Pages  -

Publication date: 1999